Skip to content

[lightning-net-tokio] Fix race-y unwrap fetching peer socket address #1449

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

TheBlueMatt
Copy link
Collaborator

I recently saw the following panic on one of my test nodes:

thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()`
    on an `Err` value: Os { code: 107, kind: NotConnected, message:
    "Transport endpoint is not connected" }',
    rust-lightning/lightning-net-tokio/src/lib.rs:250:38

Presumably what happened is somehow the connection was closed in
between us accepting it and us going to start processing it. While
this is a somewhat surprising race, its clearly reachable. The fix
proposed here is quite trivial - simply don't unwrap trying to
fetch our peer's socket address, instead treat the peer address as
None and discover the disconnection later when we go to read.

I recently saw the following panic on one of my test nodes:

```
thread 'tokio-runtime-worker' panicked at 'called `Result::unwrap()`
    on an `Err` value: Os { code: 107, kind: NotConnected, message:
    "Transport endpoint is not connected" }',
    rust-lightning/lightning-net-tokio/src/lib.rs:250:38
```

Presumably what happened is somehow the connection was closed in
between us accepting it and us going to start processing it. While
this is a somewhat surprising race, its clearly reachable. The fix
proposed here is quite trivial - simply don't `unwrap` trying to
fetch our peer's socket address, instead treat the peer address as
`None` and discover the disconnection later when we go to read.
@codecov-commenter
Copy link

codecov-commenter commented Apr 24, 2022

Codecov Report

Merging #1449 (373dfcc) into main (637fb88) will increase coverage by 0.74%.
The diff coverage is 70.49%.

❗ Current head 373dfcc differs from pull request most recent head 3f22d81. Consider uploading reports for the commit 3f22d81 to get more accurate results

@@            Coverage Diff             @@
##             main    #1449      +/-   ##
==========================================
+ Coverage   90.88%   91.63%   +0.74%     
==========================================
  Files          75       75              
  Lines       41474    46120    +4646     
  Branches    41474    46120    +4646     
==========================================
+ Hits        37695    42260    +4565     
- Misses       3779     3860      +81     
Impacted Files Coverage Δ
lightning-net-tokio/src/lib.rs 75.78% <70.49%> (-0.11%) ⬇️
lightning/src/chain/mod.rs 59.25% <0.00%> (-1.86%) ⬇️
lightning/src/routing/scoring.rs 94.00% <0.00%> (-0.36%) ⬇️
lightning-background-processor/src/lib.rs 95.11% <0.00%> (-0.11%) ⬇️
lightning/src/ln/functional_tests.rs 97.13% <0.00%> (-0.02%) ⬇️
lightning/src/chain/chainmonitor.rs 98.26% <0.00%> (+0.61%) ⬆️
lightning/src/chain/channelmonitor.rs 91.88% <0.00%> (+0.83%) ⬆️
lightning/src/routing/router.rs 93.87% <0.00%> (+1.28%) ⬆️
lightning-persister/src/util.rs 98.87% <0.00%> (+2.82%) ⬆️
lightning/src/ln/channelmanager.rs 87.78% <0.00%> (+3.04%) ⬆️
... and 3 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 637fb88...3f22d81. Read the comment docs.

Copy link
Contributor

@vincenzopalazzo vincenzopalazzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah! good refactoring! unwrap() sometimes is messy into help catching crash

tnull
tnull previously approved these changes Apr 25, 2022
Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

arik-so
arik-so previously approved these changes Apr 25, 2022
Copy link
Contributor

@arik-so arik-so left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't imagine there's any reasonable way to add a unit test simulating this exact scenario, considering you'd need an actual TCP connection?

@TheBlueMatt TheBlueMatt dismissed stale reviews from arik-so, tnull, and vincenzopalazzo via 9490393 April 25, 2022 18:40
@TheBlueMatt
Copy link
Collaborator Author

add a unit test

Hmmm, yea, I tried, wrote a test, but it didn't reproduce the issue. I still committed it because maybe there's a world where it does manage to reproduce the issue on some hosts, dunno.

arik-so
arik-so previously approved these changes Apr 27, 2022
Copy link
Contributor

@tnull tnull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two minor comments.

@TheBlueMatt TheBlueMatt dismissed stale reviews from vincenzopalazzo and arik-so via 464175e April 28, 2022 14:50
dunxen
dunxen previously approved these changes Apr 28, 2022
Copy link
Contributor

@dunxen dunxen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ACK 464175e

tnull
tnull previously approved these changes May 2, 2022
Sadly this does not reproduce the issue fixed in the previous
commit.
@TheBlueMatt
Copy link
Collaborator Author

Squashed fixup commits.

Comment on lines +650 to +651
// This attempts to find other similar races by opening connections and shutting them down
// while connecting. Sadly in testing this did *not* reproduce the previous issue.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does the "and shutting them down while connecting" happen?

Copy link
Collaborator Author

@TheBlueMatt TheBlueMatt May 4, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We drop the other TcpStream, which should close it.

@TheBlueMatt TheBlueMatt merged commit 132b072 into lightningdevkit:main May 4, 2022
@TheBlueMatt TheBlueMatt mentioned this pull request May 6, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants